Closed Bug 1728747 Opened 4 years ago Closed 2 years ago

Randomly high CPU, typing lag, tons of TCP send / receive, from/to localhost for hours in remote desktop environment

Tracking

(Not tracked)

Status:

RESOLVED INCOMPLETE

People

(Reporter: duparchy, Unassigned)

References

(URL)

Details

(Keywords: perf)

Attachments

(5 files)

TB-TCP.png 4 years ago duparchy 107.17 KB, image/png
TB-TCP2.png 4 years ago duparchy 133.40 KB, image/png
99.99% chance that this process, which takes 6% of 14CPu permanently is lost in "localhost" loop 4 years ago duparchy 86.47 KB, image/png
tcp-tcp4.png 4 years ago duparchy 158.07 KB, image/png
tb.png 3 years ago duparchy 134.21 KB, image/png

duparchy

Reporter

Description

4 years ago

Attached image TB-TCP.png

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0

Steps to reproduce:

Nothing special

Actual results:

Thunderbird has been eating CPU for hours now.
Monitoring the TB process with procmon (Windows), Thunderbird seems to be lost in a loop of TCP sends/receives from/to localhost.

What's the point of those loopback transmissions?

duparchy

Reporter

Comment 1

4 years ago

In fact, the real problem, which may not be related to those loopback connections, is that users are experiencing typing lag (again...).

Wayne Mery (:wsmwk)

Comment 2

4 years ago

Are these same users on RDS?

URL: http://forums.mozillazine.org/viewtop...

Flags: needinfo?(duparchy)

Keywords: perf

duparchy

Reporter

Comment 3

4 years ago

Hi, yes. That was my first attempt to understand the bug/feature before opening this case.

Flags: needinfo?(duparchy)

duparchy

Reporter

Comment 4

4 years ago

I manage 6 Windows Server 2019 RDSH servers and ~70 users.

This bug occurs randomly, for different users and on different servers.

I will have to revert to TB 60 (again...).

Wayne Mery (:wsmwk)

Updated

4 years ago

Comment 5

4 years ago

Hi,
As far as I can tell from the user's point of view, this is not related to 166881.

Here, I think, we have a bug that triggers a questionable 20-year-old design (loopback TCP connections).
The result is effectively a kind of DoS attack.

duparchy

Reporter

Comment 6

4 years ago

Besides the bug, the inter-process loopback connection does not seem to be a true loopback to localhost.
Or is it just Process Monitor translating "localhost" to the FQDN?

I ask because true loopback connections are supposed to be optimized for inter-process communication. See https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2012-r2-and-2012/hh997026(v=ws.11)

duparchy

Reporter

Updated

4 years ago

Summary: High CPU - tons of TCP send / receive, from/to localhost for hours. → High CPU - Typing lag - tons of TCP send / receive, from/to localhost for hours.

duparchy

Reporter

Comment 7

4 years ago

I've seen that TB 91 is now multi-process. Is there a chance that this inter-process communication through loopback (or pseudo-loopback) interface has been improved / re-designed ?

duparchy

Reporter

Comment 8

4 years ago

Attached image TB-TCP2.png

Uploaded another capture of that TCP loopback flooding. Note that it accounts for 62% of all I/O events on the server. It looks like a true denial of service.
(Note that I checked with netstat -ano: this is a true loopback (127.0.0.1). procmon.exe translates it to the FQDN.)
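The netstat check above can be scripted so it is quick to repeat on each RDSH server. Below is a minimal Python sketch, assuming the usual five-column layout of Windows `netstat -ano` TCP rows (Proto, Local Address, Foreign Address, State, PID). The function name `loopback_tcp_for_pid`, the PID 4242 and the sample rows are hypothetical illustrations, not output captured from the affected servers.

```python
# Hypothetical helper: filter `netstat -ano` output for loopback TCP rows of one PID.
LOOPBACK_PREFIXES = ("127.0.0.1:", "[::1]:")


def loopback_tcp_for_pid(netstat_text: str, pid: int) -> list[tuple[str, str, str]]:
    """Return (local, foreign, state) for TCP rows owned by `pid` with both ends on loopback."""
    rows = []
    for line in netstat_text.splitlines():
        parts = line.split()
        # Assumed TCP row layout: Proto, Local Address, Foreign Address, State, PID
        if len(parts) != 5 or parts[0] != "TCP":
            continue
        _proto, local, foreign, state, owner = parts
        if not owner.isdigit() or int(owner) != pid:
            continue
        if local.startswith(LOOPBACK_PREFIXES) and foreign.startswith(LOOPBACK_PREFIXES):
            rows.append((local, foreign, state))
    return rows


# Illustrative sample only (not captured from the affected servers).
SAMPLE = """
Active Connections

  Proto  Local Address          Foreign Address        State           PID
  TCP    127.0.0.1:55238        127.0.0.1:55239        ESTABLISHED     4242
  TCP    127.0.0.1:55239        127.0.0.1:55238        ESTABLISHED     4242
  TCP    10.0.0.5:49812         192.0.2.10:993         ESTABLISHED     4242
  TCP    127.0.0.1:52689        127.0.0.1:52682        ESTABLISHED     9999
"""

if __name__ == "__main__":
    for row in loopback_tcp_for_pid(SAMPLE, 4242):
        print(row)
```

In practice the text would come from running `netstat -ano` on the server; the header, blank lines and non-TCP rows are skipped because they do not match the five-field TCP layout.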

duparchy

Reporter

Comment 9

4 years ago

Upgraded to TB 91.1.

This TCP loopback connection bug/feature persists.
Occasionally, a user's TB process will seemingly go amok.

duparchy

Reporter

Comment 10

4 years ago

This TCP loopback connection bug/feature hasn't shown up for days now. Maybe it's gone.

Monitoring TB activity with procmon, I still see tons of registry queries, all identical and in a row. There's perhaps room for improvement here.

What's the point of these dozens of RegQueryValue calls on:

HKLM\SOFTWARE\Microsoft\Input\InputServiceEnabledForCCI

HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\OOBE\LaunchUserOOBE

duparchy

Reporter

Comment 11

4 years ago

One occurrence today of those seemingly infinite TCP loopback connections.
So it's still there...

duparchy

Reporter

Comment 12

4 years ago

Again...

I checked the user's settings.

Two IMAP accounts, both set not to synchronize locally.
No add-ons.
No global indexer.

duparchy

Reporter

Comment 13

4 years ago

Is there something I can do to help resolve this bug?
Logging?

duparchy

Reporter

Comment 14

4 years ago

For one user for whom this problem occurs frequently, I created a profile from scratch.
Problem NOT fixed.
It seems to be worse... Instead of taking 6% (of 14 vCPUs, i.e. ~85% of one CPU), I see the TB process lost in loopback connections reaching 11%.
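The percentages quoted here are shares of the whole 14-vCPU server, so converting them to single-CPU terms is simple arithmetic on the figures above:

```latex
\begin{aligned}
0.06 \times 14\ \text{vCPUs} &= 0.84\ \text{vCPU} \approx 85\%\ \text{of one CPU} \\
0.11 \times 14\ \text{vCPUs} &= 1.54\ \text{vCPUs}
\end{aligned}
```

In other words, at 11% a single Thunderbird process is busy with more than one full CPU's worth of work.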

duparchy

Reporter

Comment 15

4 years ago

A second process is taking 5% (of 14 vCPUs).
This creates problems when "real-time" networking is required: other users on Zoom, Teams, etc. are experiencing audio/video problems.

Comment hidden (obsolete)

duparchy

Reporter

Comment 17

4 years ago

up.

Is there something I can do to help resolve this bug?
Logging?

duparchy

Reporter

Comment 18

4 years ago

Attached image 99.99% chance that this process, which takes 6% of 14CPu permanently is lost in "localhost" loop

duparchy

Reporter

Comment 19

4 years ago

These are TB processes for 9 different users in the capture above.

Wayne Mery (:wsmwk)

Comment 20

4 years ago

Please try version 91 with Help > Troubleshoot mode

Flags: needinfo?(duparchy)

duparchy

Reporter

Comment 21

4 years ago

I already tested a newly created profile for a user (without extensions).
So unless there's something else I can diagnose in Troubleshoot Mode, I don't think it's worth disturbing a user.
As I said, this problem, harmless as it may look, is in fact a kind of Denial of Service and slows down the entire server.
It's not just me: this is randomly killing one server after another across an entire RDSH farm with 8 servers and 85 users.

Anyway, Thunderbird also makes way too many disk accesses for a cloud infrastructure using a shared iSCSI or FC storage array.
Unless steps are taken to improve that situation, we won't use it for long. This is sad.

Flags: needinfo?(duparchy)

duparchy

Reporter

Comment 22

4 years ago

TB 91.3. No improvement.

Two Thunderbird processes for two different users account for 54% of all events on that server.
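Event shares like the 62% and 54% quoted in this thread can be recomputed from a Process Monitor capture exported to CSV. A minimal Python sketch, assuming procmon's default CSV header ("Time of Day", "Process Name", "PID", "Operation", "Path", "Result", "Detail") and its "TCP Send"/"TCP Receive" operation names; the function name `tcp_event_share` and the sample rows are hypothetical, not data from the affected servers.

```python
import csv
import io
from collections import Counter


def tcp_event_share(csv_text: str) -> dict[str, float]:
    """Fraction of ALL captured events that are TCP Send/Receive, keyed by "name (PID)"."""
    total = 0
    tcp = Counter()
    # Assumes the header row of a procmon CSV export (column names are an assumption).
    for row in csv.DictReader(io.StringIO(csv_text)):
        total += 1
        if row["Operation"] in ("TCP Send", "TCP Receive"):
            tcp[f'{row["Process Name"]} ({row["PID"]})'] += 1
    return {proc: n / total for proc, n in tcp.items()} if total else {}


# Illustrative sample only; not data from the affected servers.
SAMPLE = '''"Time of Day","Process Name","PID","Operation","Path","Result","Detail"
"10:00:00.1","thunderbird.exe","4242","TCP Send","localhost:55238 -> localhost:55239","SUCCESS","Length: 1"
"10:00:00.2","thunderbird.exe","4242","TCP Receive","localhost:55239 -> localhost:55238","SUCCESS","Length: 1"
"10:00:00.3","thunderbird.exe","4242","RegQueryValue","HKLM\\SOFTWARE\\Microsoft\\Input\\InputServiceEnabledForCCI","SUCCESS",""
"10:00:00.4","explorer.exe","1000","ReadFile","C:\\Users\\Public\\x.txt","SUCCESS",""
'''

if __name__ == "__main__":
    for proc, share in tcp_event_share(SAMPLE).items():
        print(f"{proc}: {share:.0%} of all events")
```

Keying by name and PID keeps two users' Thunderbird processes apart, which matters on a shared RDSH server where several thunderbird.exe instances run at once.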

duparchy

Reporter

Comment 23

4 years ago

Attached image tcp-tcp4.png

No improvement w/ 91.3

Wayne Mery (:wsmwk)

Comment 24

3 years ago

(In reply to duparchy from comment #7)

I've seen that TB 91 is now multi-process. Is there a chance that this inter-process communication through loopback (or pseudo-loopback) interface has been improved / re-designed ?

Not AFAIK, which your testing confirms. No idea where this traffic is coming from. Maybe Magnus has an idea.

Anyway, Thunderbird also makes way too many disk accesses for a cloud infrastructure using a shared iSCSI or FC storage array.

Yes, this has been true for many years. Debilitating in some cases. Is there any possibility of putting Thunderbird data on the server's local disk, which should help (for example, a disk local to the hypervisor)?

I mention this because there will be no relief coming from Thunderbird until the buffering issues are fixed by benc's refactoring and Bug 1121842 - [META] RFC: C-C Thunderbird - Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends.

Flags: needinfo?(mkmelin+mozilla)

Flags: needinfo?(duparchy)

duparchy

Reporter

Comment 25

3 years ago

Hi,

The idea behind a "cloud" infrastructure is that everything is backed up at the storage-array level.
Plus, there is some level of high availability (Live volume, etc.).
Not only would moving some data to local disks be cumbersome (editing everyone's Thunderbird to move their profile), but it would also defeat the entire "Cloud" idea.
In addition, that would be impossible when we move our private cloud to a cloud provider (AWS, Azure, etc.).

For good or bad, we are living in the "Cloud" days, at least for professionals.
Developers should stop assuming that everyone sits beside their own brick-and-mortar eight-core machine with 32 GB of RAM.

Thanks for trying to push the idea to whom it may concern.

And thanks for your support.

Flags: needinfo?(duparchy)

Magnus Melin [:mkmelin]

Comment 26

3 years ago

No idea what would cause it, but it's probably a dupe of bug 1732926. Try bug 1732926 comment 15 and report back there.

Flags: needinfo?(mkmelin+mozilla)

duparchy

Reporter

Comment 27

3 years ago

Yes, I could try disabling multi-process. But given the fact that feature/bug was present before TB 91 multi-process, I doubt it will have any effect.

Wayne Mery (:wsmwk)

Comment 28

3 years ago

Just to clarify ... this issue doesn't exist for you in version 68?
The port numbers are 55238, 55239, 52682, 52689?

(In reply to duparchy from comment #27)

that feature/bug was present before TB 91 multi-process

True, but it will at least remove one variable from the diagnosis process. Lest we forget about it, I suggest that it stay disabled until all your problems are resolved.

Flags: needinfo?(duparchy)

OS: Unspecified → Windows

Summary: High CPU - Typing lag - tons of TCP send / receive, from/to localhost for hours. → Randomly high CPU, typing lag, tons of TCP send / receive, from/to localhost for hours in remote desktop environment

duparchy

Reporter

Comment 29

3 years ago

Maybe it simply went unnoticed in TB 60, but users didn't complain about performance and lag then.

Right now there are three people in a row on the same server with the "send/receive gone crazy" problem.

Flags: needinfo?(duparchy)

duparchy

Reporter

Comment 30

3 years ago

Attached image tb.png

duparchy

Reporter

Comment 31

3 years ago

Still there (not checked with TB 100+, though).

Most of the time it goes unnoticed because we're on a 10 Gb network with 16 CPUs.

That is, until high CPU/network load (several Zoom calls in a row; we're talking about an RDSH server with many users) reveals the underlying problem.

Wayne Mery (:wsmwk)

Comment 32

3 years ago

Reporter, does this still fail for you when using version 102 or newer?

Whiteboard: [closeme 2022-11-15]

duparchy

Reporter

Comment 33

3 years ago

Hi,
I rolled it out to our RDSH servers last week, so I can't tell for sure whether the problem is gone.
TB 102 seems to perform much better.
I/O is definitely improved.

I still see some dubious I/O over localhost TCP, though. But I haven't seen any TB process going crazy so far.

Is there any information about the underlying changes that would make us confident the problem is resolved?

Phoenix

Comment 34

2 years ago

Resolved per whiteboard

Status: UNCONFIRMED → RESOLVED

Closed: 2 years ago

Resolution: --- → INCOMPLETE

Whiteboard: [closeme 2022-11-15]

duparchy

Reporter

Comment 35

2 years ago

Hi,

Here we go again.

Upgraded to 115.3.1.
To help a user, I explained to him how to do a "repair folder"... and did it on my own Inbox (10K messages).

It's now been 18 hours that TB has been eating my CPU on TCP sends/receives.
